training input
Derivation of effective gradient flow equations and dynamical truncation of training data in Deep Learning
We derive explicit equations governing the cumulative biases and weights in Deep Learning with ReLU activation function, based on gradient descent for the Euclidean cost in the input layer, and under the assumption that the weights are, in a precise sense, adapted to the coordinate system distinguished by the activations. We show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. We provide a detailed discussion of several types of solutions to the gradient flow equations. A main motivation for this work is to shed light on the interpretability question in supervised learning.
Learning to Compile Programs to Neural Networks
Weber, Logan, Michel, Jesse, Renda, Alex, Carbin, Michael
A $\textit{neural surrogate of a program}$ is a neural network that mimics the behavior of a program. Researchers have used these neural surrogates to automatically tune program inputs, adapt programs to new settings, and accelerate computations. Researchers traditionally develop neural surrogates by training on input-output examples from a single program. Alternatively, language models trained on a large dataset including many programs can consume program text, to act as a neural surrogate. Using a language model to both generate a surrogate and act as a surrogate, however, leading to a trade-off between resource consumption and accuracy. We present $\textit{neural surrogate compilation}$, a technique for producing neural surrogates directly from program text without coupling neural surrogate generation and execution. We implement neural surrogate compilers using hypernetworks trained on a dataset of C programs and find that they produce neural surrogates that are $1.9$-$9.5\times$ as data-efficient, produce visual results that are $1.0$-$1.3\times$ more similar to ground truth, and train in $4.3$-$7.3\times$ fewer epochs than neural surrogates trained from scratch.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Monaco (0.04)
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Coding schemes in neural networks learning classification tasks
van Meegen, Alexander, Sompolinsky, Haim
Neural networks posses the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework where learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers
Chen, Thomas, Ewald, Patricia Muñoz
In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension $Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training input size $N$ can be arbitrarily large - thus, we are considering the underparametrized regime. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise to signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function. Our constructions make no use of gradient descent algorithms at all.
- North America > United States > Texas > Travis County > Austin (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Mendata: A Framework to Purify Manipulated Training Data
Huang, Zonghao, Gong, Neil, Reiter, Michael K.
Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit. Data purification aims to remove such manipulations prior to training the model. We propose Mendata, a novel framework to purify manipulated training data. Starting from a small reference dataset in which a large majority of the inputs are clean, Mendata perturbs the training inputs so that they retain their utility but are distributed similarly (as measured by Wasserstein distance) to the reference data, thereby eliminating hidden properties from the learned model. A key challenge is how to find such perturbations, which we address by formulating a min-max optimization problem and developing a two-step method to iteratively solve it. We demonstrate the effectiveness of Mendata by applying it to defeat state-of-the-art data poisoning and data tracing techniques.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Nepal (0.04)
Imputation using training labels and classification via label imputation
Nguyen, Thu, Halvorsen, Pål, Riegler, Michael A.
Missing data is a common problem in practical settings. Various imputation methods have been developed to deal with missing data. However, even though the label is usually available in the training data, the common practice of imputation usually only relies on the input and ignores the label. In this work, we illustrate how stacking the label into the input can significantly improve the imputation of the input. In addition, we propose a classification strategy that initializes the predicted test label with missing values and stacks the label with the input for imputation. This allows imputing the label and the input at the same time. Also, the technique is capable of handling data training with missing labels without any prior imputation and is applicable to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization
Chen, Thomas, Ewald, Patricia Muñoz
In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P$ where $\delta_P$ measures the signal to noise ratio of training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace in the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in the given context.
- North America > United States > Texas > Travis County > Austin (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Kernel Learning for Explainable Climate Science
Lalchand, Vidhi, Tazi, Kenza, Cheema, Talay M., Turner, Richard E., Hosking, Scott
The Upper Indus Basin, Himalayas provides water for 270 million people and countless ecosystems. However, precipitation, a key component to hydrological modelling, is poorly understood in this area. A key challenge surrounding this uncertainty comes from the complex spatial-temporal distribution of precipitation across the basin. In this work we propose Gaussian processes with structured non-stationary kernels to model precipitation patterns in the UIB. Previous attempts to quantify or model precipitation in the Hindu Kush Karakoram Himalayan region have often been qualitative or include crude assumptions and simplifications which cannot be resolved at lower resolutions. This body of research also provides little to no error propagation. We account for the spatial variation in precipitation with a non-stationary Gibbs kernel parameterised with an input dependent lengthscale. This allows the posterior function samples to adapt to the varying precipitation patterns inherent in the distinct underlying topography of the Indus region. The input dependent lengthscale is governed by a latent Gaussian process with a stationary squared-exponential kernel to allow the function level hyperparameters to vary smoothly. In ablation experiments we motivate each component of the proposed kernel by demonstrating its ability to model the spatial covariance, temporal structure and joint spatio-temporal reconstruction. We benchmark our model with a stationary Gaussian process and a Deep Gaussian processes.
- Asia > Pakistan > Arabian Sea (0.25)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
- Asia > China (0.05)
- (2 more...)
REaaS: Enabling Adversarially Robust Downstream Classifiers via Robust Encoder as a Service
Qu, Wenjie, Jia, Jinyuan, Gong, Neil Zhenqiang
Abstract--Encoder as a service is an emerging cloud service. A larger certified radius indicates better certified robustness against adversarial examples. In general, there are two categories of complementary methods to build a certifiably robust classifier and derive In an encoder as a service, a service provider (e.g., OpenAI, its certified radius for a testing input, i.e., base classifier Google, and Amazon) pre-trains a general-purpose feature (BC) based certification [7], [8], [9], [10] and smoothed extractor (called encoder) and deploys it as a cloud service; classifier (SC) based certification (also known as randomized and a client queries the cloud service APIs for the feature smoothing) [11], [12], [13]. BC based certification aims to vectors of its training/testing inputs when training/testing a directly derive the certified radius of a given classifier (called downstream classifier. For instance, the encoder could be pretrained base classifier) for a testing input. BC based certification using supervised learning on a large amount of labeled requires white-box access to the base classifier as it often data or self-supervised learning [1], [2] on a large amount of requires propagating the perturbation from the input layer to unlabeled data. A client could be a smartphone, IoT device, the output layer of the base classifier layer by layer. SC based self-driving car, or edge device in the era of edge computing. In the Standard Encoder as a Service (SEaaS), the smoothed classifier for the testing input. To increase the testing service provides a single API (called Feature-API) for clients inputs' certified radii, SC based certification often requires Wenjie Qu performed this research when he was an intern in Gong's group. Our input-space certified radius R guarantees the certification. However, the client does not have white-box client's base or smoothed downstream classifier predicts the access to the encoder deployed on the cloud server, making same label for the testing input if the l The second challenge perturbation added to the testing input is less than R. is that, although a client can use SC based certification by treating the composition of the encoder and its downstream The key challenge of implementing our F2IPerturb-API is classifier as a base classifier, it incurs a large communication how to find the largest input-space certified radius R for a cost for the client and a large computation cost for the cloud given testing input and its feature-space certified radius R Therefore, the client requires e queries to the Feature-API per training input, problem is challenging to solve due to the highly non-linear where e is the number of epochs used to train the downstream constraint.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Middle East > Jordan (0.04)
Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors
Detommaso, Gianluca, Gasparin, Alberto, Wilson, Andrew, Archambeau, Cedric
As we move away from the data, the predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct overconfidence of Bayesian deep learning models outside of the training domain. We define DAPs as prior distributions over the model parameters that depend on the inputs through a measure of their distance from the training set. DAP calibration is agnostic to the posterior inference method, and it can be performed as a post-processing step. We demonstrate its effectiveness against several baselines in a variety of classification and regression problems, including benchmarks designed to test the quality of predictive distributions away from the data.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)